

Section: New Results

Analyzing and Reasoning on Heterogeneous Semantic Graphs

SPARQL Template Transformation Language

Participants : Olivier Corby, Catherine Faron-Zucker, Raphaël Gazzotti.

In the continuation of our work on the design of the SPARQL Template Transformation Language (STTL) [18], we showed that it can be used as a constraint language for RDF. We applied this approach to implement the semantics of the OWL 2 profiles, each viewed as a set of constraints to be validated: we defined one STTL transformation for each of the three OWL 2 profiles (OWL RL, OWL QL and OWL EL). Applying one of these transformations to an ontology (in OWL/RDF syntax) validates it against the OWL 2 profile that the transformation represents. This work was presented at the RR 2016 conference [34].

Exposing Heterogeneous Data Sources on the Web of Linked Open Data

Participants : Catherine Faron-Zucker, Franck Michel.

While the emerging Web of Data grows continuously as data sets are published as Linked Open Data, data is produced ever faster in data silos, where it often remains locked. In particular, NoSQL systems have been remarkably successful in recent years. Consequently, harnessing the data available in NoSQL databases to populate the Web of Data, and more generally achieving RDF-based data integration and SPARQL querying of NoSQL databases, are timely questions.

Together with Johan Montagnat (I3S, UNS), we previously proposed a generic mapping language, xR2RML, able to describe the mapping of most common types of databases into an arbitrary RDF representation [78]. In the continuation of this work, we developed a two-step approach to execute SPARQL queries over heterogeneous databases, based on the xR2RML mapping of the database to RDF. We demonstrated the effectiveness of this approach by providing SPARQL access to MongoDB, the popular NoSQL document store. This work was undertaken in the context of the PhD of Franck Michel, and was published at the WebIST 2016 conference [43] and at the DEXA 2016 conference [44].
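The paragraph above does not detail the two steps. The following minimal Python sketch shows one plausible reading, under stated assumptions: a triple pattern is first rewritten into a database-independent query using the mapping, and that abstract query is then turned into the arguments of a MongoDB find(). The MAPPING dictionary, the function names and the collection and field names are all hypothetical stand-ins, not the xR2RML syntax or the published implementation.

```python
# Hypothetical, heavily simplified mapping: predicate IRI -> (collection, field).
# Real xR2RML mappings are RDF documents; this dict is only an illustrative stand-in.
MAPPING = {
    "http://xmlns.com/foaf/0.1/name": ("people", "name"),
    "http://xmlns.com/foaf/0.1/age": ("people", "age"),
}


def to_abstract_query(triple_pattern):
    """Step 1 (assumed): rewrite a triple pattern into a database-independent query."""
    _subject, predicate, obj = triple_pattern
    collection, field = MAPPING[predicate]
    # A term starting with '?' is a SPARQL variable; anything else is a constant.
    constant = None if obj.startswith("?") else obj
    return {"source": collection, "field": field, "equals": constant}


def to_mongo_query(abstract):
    """Step 2 (assumed): turn the abstract query into (collection, find() filter)."""
    if abstract["equals"] is None:
        # Unbound object: any document that has the field can produce a triple.
        query_filter = {abstract["field"]: {"$exists": True}}
    else:
        query_filter = {abstract["field"]: abstract["equals"]}
    return abstract["source"], query_filter


collection, query_filter = to_mongo_query(
    to_abstract_query(("?person", "http://xmlns.com/foaf/0.1/name", "Alice"))
)
print(collection, query_filter)
```

Keeping the intermediate query independent of the target database is what would let the first step be reused for stores other than MongoDB; only the second function would need a database-specific counterpart.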

Combining Argumentation Theory and Natural Language Processing

Participants : Serena Villata, Valerio Basile, Elena Cabrio, Andrea Tettamanzi, Tom Bosc.

We have proposed a new approach to text exploration combining argumentation theory and natural language processing. We define bipolar entailment graphs, i.e., graphs whose nodes are text fragments and whose edges represent entailment or non-entailment relations. We adopt abstract dialectical frameworks to define acceptance conditions for the nodes, such that the resulting framework returns relevant information for the text exploration task. The results of this research have been published at the ICAART conference [33].
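As an illustration of the structure described above, the following Python sketch builds a toy bipolar entailment graph and enumerates the sets of nodes in which every node agrees with its acceptance condition (the two-valued models of the framework). The text fragments T1 to T3, their relations and, in particular, the acceptance condition are assumptions made for this example; they are not taken from the published work.

```python
from itertools import combinations

# Toy bipolar entailment graph: (source, target) -> relation label.
# The fragments T1..T3 and their relations are invented for illustration.
EDGES = {
    ("T1", "T2"): "entailment",
    ("T3", "T2"): "non-entailment",
}
NODES = {"T1", "T2", "T3"}


def parents(node):
    """Return the parents of `node`, each with the label of the connecting edge."""
    return {src: rel for (src, dst), rel in EDGES.items() if dst == node}


def accepted(node, interpretation):
    """Hypothetical acceptance condition, not the one used in the paper.

    A node without parents is accepted. Otherwise it is accepted when some
    accepted parent entails it and no accepted parent is in a
    non-entailment relation with it.
    """
    links = parents(node)
    if not links:
        return True
    active = [rel for src, rel in links.items() if src in interpretation]
    return "entailment" in active and "non-entailment" not in active


def two_valued_models():
    """Enumerate the node sets in which membership matches every acceptance condition."""
    models = []
    for size in range(len(NODES) + 1):
        for subset in combinations(sorted(NODES), size):
            chosen = set(subset)
            if all((n in chosen) == accepted(n, chosen) for n in NODES):
                models.append(chosen)
    return models


print(two_valued_models())
```

In this toy graph T2 is rejected because the accepted fragment T3 does not entail it, so the only model contains T1 and T3. Exhaustive enumeration is exponential in the number of nodes and is only meant to make the definition concrete.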

Moreover, we have proposed a new approach to argument mining for Twitter data. The approach first detects argumentative tweets in a stream of tweets and then, starting from this set of argument-tweets, predicts the relations, i.e., attack and support, holding between two argument-tweets. The annotated corpus resulting from this research line has been described in a paper published at the LREC conference [30], while the results of the argument mining task have been published at the COMMA conference [31].

Following a novel research direction, we investigated the relationship between the emotions displayed by the participants in our experiments and the sentiment expressed in the natural language of their arguments. We ran state-of-the-art sentiment analysis software on the transcriptions of the debates and compared the results with the output of the emotion reading systems. The results of our analysis were presented at the Artificial Intelligence and Cognition Workshop [26] and at the Italian Conference on Computational Linguistics [23].

Finally, together with Célia da Costa Pereira (UNS) and Mauro Dragoni (FBK, Italy), we have proposed an opinion summary application built on top of an argumentation framework, used to exchange, communicate and resolve possibly conflicting viewpoints in distributed scenarios. We show how this application is able to extract relevant and debated opinions from a set of documents containing user-generated content from online commercial Web sites. The result of this research has been published as a short paper at the IJCAI conference [35]; an extended version has been submitted to the AI Comm. journal and is currently under review.

Opinion Mining

Participants : Andrea Tettamanzi, Serena Villata.

Together with Célia da Costa Pereira of I3S and Mauro Dragoni of FBK, Trento, who visited our team for three months from April to June 2014, we have proposed DRANZIERA, an evaluation protocol for multi-domain opinion mining methods [36], and an argumentation framework for opinion mining [35].

SMILK - Automatic Generation of Quizzes through Semantic Web Technologies

Participant : Oscar Rodríguez Rocha.

This research focuses on the automatic generation of quizzes using Semantic Web technologies. It takes inspiration from existing work on the automatic generation of multiple-choice questions from domain ontologies, and aims to apply these techniques and contribute to their extension in order to semantically generate statements that describe the content of a given Web ontology. This work is carried out in the context of SMILK (Social Media Intelligence and Linked Knowledge), a joint laboratory (LabCom, 2013-2016) between the Wimmics team and the Research and Innovation unit of VISEO (Grenoble). Natural Language Processing, Linked Open Data and Social Networks, as well as the links between them, are at the core of this LabCom. The purpose of SMILK is both to develop research and technologies to retrieve, analyze and reason on textual data coming from Web sources, and to make use of LOD, social network structures and interactions to improve the analysis and understanding of textual resources. Topics covered by SMILK also include the use of data and vocabularies published on the Web to search, analyze, disambiguate and structure textual knowledge in a smart way, and also to feed internal information sources; reasoning on the combination of internal and public data and schemas; and the querying and presentation of data and inferences in natural formats.

Event Identification & Tracking

Participants : Amosse Edouard, Elena Cabrio, Nhan Le Thanh.

In the past year, we have been working on approaches for detecting, classifying and tracking events on Twitter. In the context of social media, an event is considered as "An occurrence causing change in the volume of text data that discusses the associated topic at a specific time. This occurrence is characterized by topic and time, and often associated with entities such as people and location". This definition shows that Named Entities (NE) play a key role in events on social media, and particularly on Twitter. In our approaches, we exploit the NE in tweets to analyse events on Twitter.

Event Identification and Classification

We developed an approach that exploits occurrences of Named Entities in tweets to train a supervised model for event identification and classification.

We combined techniques from Natural Language Processing, Linked Open Data and Machine Learning to build a supervised model for classifying tweets. More specifically, we replaced the NE in tweets by their related class in ontologies (e.g. DBpedia or YAGO) and used the modified content to train machine learning algorithms (e.g. SVM, Naive Bayes and Neural Network). Our experiments on two gold standard datasets showed that the NER mechanism helped reduce the overfitting of the classifiers.
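The replacement step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the ENTITY_CLASSES lookup table and the generalize function are hypothetical, and in practice the class of each entity would be obtained from an entity linker and an ontology such as DBpedia or YAGO rather than from a hand-written dictionary.

```python
import re

# Hypothetical lookup from entity surface form to an ontology class.
# In practice the class would come from an entity linker and DBpedia or YAGO.
ENTITY_CLASSES = {
    "Lionel Messi": "SoccerPlayer",
    "Barcelona": "SoccerClub",
    "Paris": "City",
}


def generalize(tweet, entity_classes=ENTITY_CLASSES):
    """Replace each known named entity in `tweet` by its ontology class."""
    if not entity_classes:
        return tweet
    # Try longer names first so that multi-word entities are matched as a whole.
    names = sorted(entity_classes, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: entity_classes[match.group(0)], tweet)


print(generalize("Lionel Messi scores again for Barcelona!"))
```

The generalized text ("SoccerPlayer scores again for SoccerClub!") would then be fed to the classifier instead of the raw tweet, so that the model learns from entity classes rather than from individual names.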

Event Tracking

More recently, we started to work on an approach for tracking planned events on Twitter. In this work, we were particularly interested in tracking the evolution of existing events over time, for example the important actions in a soccer game (goals, yellow and red cards). We proposed an unsupervised approach based on NE in tweets and graph analysis to process the Twitter stream in real time. In this approach, we dynamically update a local gazetteer with the actors involved in the events, such as player and team names, as well as terms that describe the actions of interest (e.g. goal or yellow card for football). The preliminary evaluations are promising, since we are able to track the most important events in a soccer game as well as the players or teams involved in the actions.
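A much simplified Python sketch of this idea follows. It assumes that tweets are processed in time windows, that actors and action terms are matched by plain substring search against the gazetteer, and that an action is reported when it co-occurs with an actor in at least min_mentions tweets of the window. The gazetteer content, the window_events function and the threshold are hypothetical; the dynamic update of the gazetteer and the actual graph analysis are not reproduced here.

```python
from collections import Counter

# Hypothetical gazetteer; in the described approach it is updated dynamically.
gazetteer = {
    "actors": {"messi", "ramos"},
    "actions": {"goal", "yellow card"},
}


def window_events(tweets, gazetteer, min_mentions=2):
    """Return (action, actor) pairs that co-occur in at least `min_mentions` tweets."""
    edges = Counter()
    for tweet in tweets:
        text = tweet.lower()
        actors = [a for a in gazetteer["actors"] if a in text]
        actions = [a for a in gazetteer["actions"] if a in text]
        # Each co-occurrence adds weight to an edge of the action-actor graph.
        edges.update((action, actor) for action in actions for actor in actors)
    return sorted(pair for pair, count in edges.items() if count >= min_mentions)


window = [
    "GOAL!!! Messi does it again",
    "What a goal by Messi",
    "Yellow card for Ramos",
]
print(window_events(window, gazetteer))
```

With this toy window only the pair ("goal", "messi") is reported, since the yellow card is mentioned in a single tweet. The threshold trades recall for robustness to isolated mentions and would need tuning against the volume of the stream.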

Software and Hardware Architecture of EMOTICA: an Emotions Detection System

Participant : Nhan Le Thanh.

This work is performed with Chaka Kone (3rd year PhD student - LEAT, UNS) and Cécile Belleudy (Thesis Director - LEAT, UNS). The aim of this PhD work is to propose a complete low-power system for the recognition of emotions that satisfies all application constraints, such as energy consumption, size and positioning of sensors. To achieve this goal, our work focuses on two main axes: the detection of emotions and the architectural exploration of communicating objects for health, with particular emphasis on the energy consumption of such systems.

Conversational Agent Assistant

Participants : Raphaël Gazzotti, Catherine Faron-Zucker, Fabien Gandon.

This CIFRE PhD thesis is performed in collaboration with SynchroNext, a company located in Nice. As part of this thesis, we aim to set up an Embodied Conversational Agent (ECA) to handle the frequently asked questions (FAQs) addressed to customer advisers. The ECA will need to integrate a question answering system to address the most common issue types without human intervention [76], [81]. For this purpose, it must be able to understand the questions asked by users in natural language and to reason with the knowledge acquired. Beyond such a question answering system, the ECA must be able to resume the conversation with the Internet user according to the nature of their requests or the sequence of questions formulated. The objective is to reduce the dropout rate of Internet users on FAQs and to reduce the number of incoming calls and e-mails. This will enable customer advisers to focus on more difficult questions.